SpeeDO: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network

Authors

  • Zhongyang Zheng
  • Wenrui Jiang
  • Gang Wu
Abstract

Convolutional Neural Networks (CNNs) have achieved breakthrough results on many machine learning tasks. However, training CNNs is computationally intensive. When the training set is large and the CNN is deep, as is typically required for attaining high classification accuracy, training a model can take days or even weeks. In this work, we propose SpeeDO (for Open DEEP learning System in backward order). SpeeDO uses off-the-shelf hardware to speed up CNN training, aiming to achieve two goals. First, parallelizing stochastic gradient descent (SGD) on a GPU cluster built from off-the-shelf hardware improves deployability and cost effectiveness. Second, such a widely deployable hardware configuration can serve as a benchmark on which software and algorithmic approaches can be evaluated and improved. Our experiments compared representative parallel SGD schemes and identified the bottlenecks where overhead can be further reduced.
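As an illustration of the kind of "parallel SGD scheme" the abstract refers to, the sketch below simulates synchronous data-parallel SGD with gradient averaging on a toy least-squares problem. This is only a minimal sketch under assumed settings (a linear model, four simulated workers, learning rate 0.1); it is not SpeeDO's actual implementation or code.

# Minimal sketch of synchronous data-parallel SGD with gradient
# averaging. NOT SpeeDO's implementation; the linear model, shard
# count, and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, split into shards, one per simulated worker.
X = rng.normal(size=(1024, 16))
true_w = rng.normal(size=16)
y = X @ true_w + 0.01 * rng.normal(size=1024)

n_workers = 4
X_shards = np.array_split(X, n_workers)
y_shards = np.array_split(y, n_workers)

w = np.zeros(16)   # "parameter server" copy of the model
lr = 0.1

def local_gradient(w, Xs, ys):
    """Least-squares gradient computed on one worker's shard."""
    residual = Xs @ w - ys
    return Xs.T @ residual / len(ys)

for step in range(100):
    # Synchronous scheme: every worker computes its gradient against the
    # same snapshot of w.
    grads = [local_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    # The server averages the worker gradients and applies one SGD update.
    w -= lr * np.mean(grads, axis=0)

print("parameter error:", np.linalg.norm(w - true_w))

In a real GPU cluster the averaging step is where communication overhead concentrates, which is the kind of bottleneck the experiments in the paper set out to measure.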

Similar articles

The Outer Product Structure of Neural Network Derivatives

Training methods for neural networks are primarily variants on stochastic gradient descent. Techniques that use (approximate) second-order information are rarely used because of the computational cost and noise associated with those approaches in deep learning contexts. We can show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutiona...
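As a concrete reminder of the outer-product structure the abstract alludes to (a standard backpropagation identity rather than a result specific to that paper), consider a fully connected layer with input activation a, weights W, bias b, pre-activation z = Wa + b, and loss L:

\[
\frac{\partial L}{\partial W} \;=\; \delta\, a^{\top},
\qquad
\delta := \frac{\partial L}{\partial z},
\qquad
z = W a + b .
\]

The per-example weight gradient is thus a rank-one outer product of the backpropagated error and the layer input, which is the structure that approximate second-order methods try to exploit.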

Distributed asynchronous optimization of convolutional neural networks

Recently, deep Convolutional Neural Networks have been shown to outperform Deep Neural Networks for acoustic modelling, producing state-of-the-art accuracy in speech recognition tasks. Convolutional models provide increased model robustness through the usage of pooling invariance and weight sharing across spectrum and time. However, training convolutional models is a very computationally expens...
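For orientation, the asynchronous setting is usually formalized with a staleness term; this is the generic formulation, not necessarily the exact update rule of the paper above. Each worker computes a gradient at a possibly stale copy of the parameters, and the server applies it immediately:

\[
w_{t+1} \;=\; w_t \;-\; \eta \, \nabla L_{B_t}\!\left(w_{t-\tau_t}\right),
\]

where B_t is the mini-batch processed by the contributing worker and \tau_t \ge 0 is the staleness of the parameter copy it read.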

GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange

We address the issue of speeding up the training of convolutional neural networks by studying a distributed method adapted to stochastic gradient descent. Our parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way of sharing information between different threads based on gossip algorithms that show good consensus co...
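The sketch below illustrates the flavor of gossip exchange described in the abstract: workers run independent local SGD, and randomly chosen pairs average their parameter copies to drift toward consensus. The pairing rule, the 0.5 mixing weight, and the toy quadratic objective are illustrative assumptions, not the exact GoSGD protocol.

# Minimal sketch of gossip-style parameter exchange between local SGD
# workers. Pairing rule, mixing weight, objective, and schedule are
# illustrative assumptions, not the exact GoSGD algorithm.
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim = 8, 10
target = rng.normal(size=dim)              # minimizer of the toy objective
params = [rng.normal(size=dim) for _ in range(n_workers)]
lr = 0.05

def grad(w):
    """Gradient of the toy objective 0.5 * ||w - target||^2."""
    return w - target

for step in range(200):
    # Each worker takes an independent local SGD step on its own copy.
    for i in range(n_workers):
        params[i] = params[i] - lr * grad(params[i])
    # Gossip exchange: one random pair of workers averages its parameters,
    # gradually driving all local copies toward consensus.
    i, j = rng.choice(n_workers, size=2, replace=False)
    mixed = 0.5 * (params[i] + params[j])
    params[i], params[j] = mixed.copy(), mixed.copy()

spread = max(np.linalg.norm(p - params[0]) for p in params)
print("max disagreement between workers:", spread)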

Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks

It is known that training deep neural networks, in particular deep convolutional networks, with aggressively reduced numerical precision is challenging. The stochastic gradient descent algorithm becomes unstable in the presence of noisy gradient updates resulting from arithmetic with limited numeric precision. One of the well-accepted solutions facilitating the training of low precision fixed p...
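The abstract is cut off before the paper's own remedy. As background only, one commonly cited ingredient for fixed-point training is stochastic rounding during quantization, sketched below under assumed parameters (16-bit words, 8 fractional bits); this should not be read as the specific method that paper proposes.

# Illustration of fixed-point quantization with stochastic rounding, a
# commonly used ingredient in low-precision training. Generic sketch
# with arbitrary word length and fractional bits; not necessarily the
# solution proposed in the paper above.
import numpy as np

rng = np.random.default_rng(2)

def to_fixed_point(x, frac_bits=8, word_bits=16):
    """Quantize x to signed fixed point using stochastic rounding."""
    scale = 2.0 ** frac_bits
    scaled = x * scale
    # Round down, then round up with probability equal to the fractional
    # remainder, so the quantization error is zero in expectation.
    floor = np.floor(scaled)
    quantized = floor + (rng.random(x.shape) < (scaled - floor))
    # Saturate to the representable range of the signed word.
    limit = 2.0 ** (word_bits - 1)
    quantized = np.clip(quantized, -limit, limit - 1)
    return quantized / scale

x = rng.normal(size=100_000)
q = to_fixed_point(x)
print("mean quantization error:", np.mean(q - x))  # close to zero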

Accelerating deep neural network training with inconsistent stochastic gradient descent

Stochastic Gradient Descent (SGD) updates a Convolutional Neural Network (CNN) with a noisy gradient computed from a random batch, and each batch evenly updates the network once in an epoch. This model applies the same training effort to each batch, but it overlooks the fact that the gradient variance, induced by Sampling Bias and Intrinsic Image Difference, renders different training dynamics on...
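For reference, the per-batch update described above is the standard mini-batch SGD rule (a textbook identity, not that paper's contribution):

\[
w_{t+1} \;=\; w_t \;-\; \frac{\eta}{|B_t|} \sum_{i \in B_t} \nabla_w\, \ell(x_i, w_t),
\]

where B_t is the batch sampled at step t and \eta is the learning rate; the gradient variance the abstract mentions is the variance of this batch average across different draws of B_t.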


Publication date: 2015